NSF PAR Search | NSF Public Access Repository

A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning

Cherepanova, Valeriia; Levin, Roman; Somepalli, Gowthami; Geiping, Jonas; Bruss, C. Bayan; Wilson, Andrew G.; Goldstein, Tom; Goldblum, Micah (December 2023, Advances in Neural Information Processing Systems)

Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent over-fitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of LASSO for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.
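The abstract names the method only as an input-gradient-based analogue of LASSO. Below is a minimal sketch of one such penalty, assuming a group-lasso form: for each input feature, take the L2 norm of its gradient column across the dataset, then sum over features. The network is a hypothetical one-hidden-layer tanh model, and gradients are taken of the model output rather than a training loss so the derivative stays analytic; neither choice comes from the paper. For a linear model f(x) = w·x every row of the gradient matrix equals w, so the penalty reduces to √n · ‖w‖₁, which is ordinary LASSO up to a constant.

```python
import numpy as np


def input_gradients(X, W, b, v):
    """Gradient of f(x) = v . tanh(W x + b) with respect to x, for every row of X.

    Hypothetical one-hidden-layer model. Shapes: X (n, d), W (h, d), b (h,), v (h,).
    Returns an (n, d) matrix whose column j holds df/dx_j for each sample.
    """
    pre = X @ W.T + b                  # (n, h) hidden pre-activations
    dtanh = 1.0 - np.tanh(pre) ** 2    # (n, h) derivative of tanh
    return (dtanh * v) @ W             # chain rule: sum_k v_k * tanh'(pre_k) * W[k, j]


def feature_scores(grads):
    """L2 norm of each feature's gradient column across the dataset, shape (d,)."""
    return np.sqrt((grads ** 2).sum(axis=0))


def input_gradient_lasso(grads):
    """Assumed group-lasso penalty: sum over features of each column's L2 norm."""
    return feature_scores(grads).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    W = rng.normal(size=(8, 5))
    W[:, 3:] = 0.0                     # features 3 and 4 never reach the hidden layer
    b, v = np.zeros(8), rng.normal(size=8)
    g = input_gradients(X, W, b, v)
    print(feature_scores(g))           # scores for features 3 and 4 are exactly 0
    print(input_gradient_lasso(g))     # penalty value before scaling by a weight lambda
```

In a training loop, λ times this penalty would be added to the loss and minimized jointly, and the features with the largest scores would be kept afterwards; the training loop, the choice of λ, and the selection threshold are all left out of this sketch.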
GOAT: A Global Transformer on Large-scale Graphs

Kong, Kezhi; Chen, Jiuhai; Kirchenbauer, John; Ni, Renkun; Bruss, C. Bayan; Goldstein, Tom (July 2023, Proceedings of Machine Learning Research)

Graph transformers have been competitive on graph classification tasks, but they fail to outperform Graph Neural Networks (GNNs) on node classification, which is a common task performed on large-scale graphs for industrial applications. Meanwhile, existing GNN architectures are limited in their ability to perform equally well on both homophilous and heterophilous graphs as their inductive biases are generally tailored to only one setting. To address these issues, we propose GOAT, a scalable global graph transformer. In GOAT, each node conceptually attends to all the nodes in the graph and homophily/heterophily relationships can be learnt adaptively from the data. We provide theoretical justification for our approximate global self-attention scheme, and show it to be scalable to large-scale graphs. We demonstrate the competitiveness of GOAT on both heterophilous and homophilous graphs with millions of nodes.
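The abstract does not describe how GOAT approximates global attention, but the quantity being approximated can be sketched. Below is exact single-head self-attention in which every node attends to every node, with no adjacency mask, so any homophily or heterophily pattern has to come from the learned projections. The projection matrices `Wq`, `Wk`, `Wv` and the single-head form are illustrative assumptions, not GOAT's architecture.

```python
import numpy as np


def global_self_attention(H, Wq, Wk, Wv):
    """Exact single-head attention in which each node attends to all N nodes.

    H: (N, d) node features; Wq, Wk: (d, d_k); Wv: (d, d_v). Returns (N, d_v).
    The (N, N) score matrix makes time and memory quadratic in N.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])        # row i: node i's scores over all nodes
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # row-wise softmax; each row sums to 1
    return A @ V                                  # attention-weighted sum of all node values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.normal(size=(6, 4))                   # 6 nodes, 4 features each; no edges used
    Wq, Wk, Wv = (rng.normal(size=(4, 3)) for _ in range(3))
    print(global_self_attention(H, Wq, Wk, Wv).shape)   # (6, 3)
```

At the scale in the abstract this exact form is infeasible: a graph with 10^6 nodes would need 10^12 attention scores per head, which is the cost an approximate global attention scheme is meant to avoid.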
Explaining National Trends in Terrestrial Water Storage

https://doi.org/10.3389/fenvs.2019.00085

Bruss, C. Bayan; Nateghi, Roshanak; Zaitchik, Benjamin F. (June 2019, Frontiers in Environmental Science)

